source("~/publicdataprojects/scripts/source.R")
Henrik Vitus Bering Laursen
September 25, 2025
I wanted to walk through how to use R shiny and put in some publicly available Danish statistics as the basis for a dashboard.
I want to do the following:
R shiny in the simplest possible way
So here we go.
First I sign in to the Statistics Denmark website, after creating a user (necessary for bigger downloads).
Then, I pick out a type of data I want to look at. There are so many. Because of my interest in healthcare I look at the table for hospital utilization, SBR04.
From that table I select all variables. I will probably only use very few, but I do it just in case I use more of it in the future.
I will then:
R shiny around what would be interesting to look at given the data
R shiny app. Probably actually publish it online, while remembering to cite Statistics Denmark as the source
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr 1.1.4 ✔ readr 2.1.5
✔ forcats 1.0.0 ✔ stringr 1.5.1
✔ ggplot2 3.5.2 ✔ tibble 3.2.1
✔ lubridate 1.9.4 ✔ tidyr 1.3.1
✔ purrr 1.0.4
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag() masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
The process of finding data from Statistics Denmark currently comes in these steps:
Rows: 539 Columns: 1
Error in nchar(x, "width"): invalid multibyte string, element 1
Odd. I didn’t expect that error. Let’s see what’s up.
[1] "\"K\xf8n i alt\";\"Alder i alt\";\"Uanset sygehusv\xe6sen\";\"Personer med ophold (antal)\";2788101.0;2876145.0;2900078.0;2793650.0;2838793.0;2919830.0;2962563.0;2957759.0"
[2] "\"K\xf8n i alt\";\"Alder i alt\";\"Uanset sygehusv\xe6sen\";\"Personer med ophold (pct.)\";48.2;49.5;49.8;47.8;48.3;49.2;49.7;49.4"
[3] "\"K\xf8n i alt\";\"Alder i alt\";\"Uanset sygehusv\xe6sen\";\"Ophold per person (antal)\";2.1;2.2;2.3;2.1;2.2;2.2;2.2;2.2"
[4] "\"K\xf8n i alt\";\"Alder i alt\";\"Uanset sygehusv\xe6sen\";\"Personer med ophold p\xe5 under 12 timer (antal)\";2713543.0;2804125.0;2831454.0;2725266.0;2769264.0;2851193.0;2895701.0;2890614.0"
[5] "\"K\xf8n i alt\";\"Alder i alt\";\"Uanset sygehusv\xe6sen\";\"Personer med ophold p\xe5 under 12 timer (pct.)\";46.9;48.3;48.6;46.7;47.1;48.1;48.6;48.2"
[6] "\"K\xf8n i alt\";\"Alder i alt\";\"Uanset sygehusv\xe6sen\";\"Ophold p\xe5 under 12 timer per person (antal)\";2.0;2.0;2.2;2.0;2.0;2.1;2.1;2.1"
Ok this may need a different reading function.
Rows: 540 Columns: 12
── Column specification ────────────────────────────────────────────────────────
Delimiter: ";"
chr (4): X1, X2, X3, X4
dbl (8): X5, X6, X7, X8, X9, X10, X11, X12
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
# A tibble: 6 × 12
X1 X2 X3 X4 X5 X6 X7 X8 X9 X10 X11
<chr> <chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 Køn i alt Alder … Uans… Pers… 2.79e6 2.88e6 2.90e6 2.79e6 2.84e6 2.92e6 2.96e6
2 Køn i alt Alder … Uans… Pers… 4.82e1 4.95e1 4.98e1 4.78e1 4.83e1 4.92e1 4.97e1
3 Køn i alt Alder … Uans… Opho… 2.1 e0 2.2 e0 2.3 e0 2.1 e0 2.2 e0 2.2 e0 2.2 e0
4 Køn i alt Alder … Uans… Pers… 2.71e6 2.80e6 2.83e6 2.73e6 2.77e6 2.85e6 2.90e6
5 Køn i alt Alder … Uans… Pers… 4.69e1 4.83e1 4.86e1 4.67e1 4.71e1 4.81e1 4.86e1
6 Køn i alt Alder … Uans… Opho… 2 e0 2 e0 2.2 e0 2 e0 2 e0 2.1 e0 2.1 e0
# ℹ 1 more variable: X12 <dbl>
Alright. I expected commas as delimiters, but the file uses semicolons. And some googling revealed “ISO-8859-1” (Latin-1) as the encoding commonly used for Danish text.
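For reference, here is a minimal sketch of that fix. It assumes readr is installed and uses a temporary toy file (two shortened rows taken from the output above) as a stand-in for the real download; the names `rows`, `tmp`, and `demo` are made up for the illustration.

```r
library(readr)

# Two shortened rows from the output above, written with \u escapes so the
# script itself stays ASCII ("\u00f8" is the Danish letter ø).
rows <- c(
  "\"K\u00f8n i alt\";\"Personer med ophold (antal)\";2788101.0;2876145.0",
  "\"K\u00f8n i alt\";\"Personer med ophold (pct.)\";48.2;49.5"
)

# Write the rows as ISO-8859-1 bytes to a temporary file
# (hypothetical stand-in for the file downloaded from Statistics Denmark).
tmp <- tempfile(fileext = ".csv")
con <- file(tmp, open = "wb")
writeLines(iconv(rows, from = "UTF-8", to = "ISO-8859-1"), con, useBytes = TRUE)
close(con)

# Read it back: semicolon delimiter, Latin-1 encoding, no header row.
demo <- read_delim(
  tmp,
  delim = ";",
  locale = locale(encoding = "ISO-8859-1"),
  col_names = FALSE,
  show_col_types = FALSE
)
```

With the encoding declared in `locale()`, the text columns come back as proper UTF-8 strings and the remaining columns are parsed as numbers.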
Now, the structure seems to need some cleaning, with several repeating values. The first four columns seem to be “supercolumns” that form a hierarchy: going from left to right, each column is a grouping level nested “beneath” the previous one.
First, let’s replace those X column names with their corresponding actual names.
# A tibble: 6 × 12
sex age sector measure `2017` `2018` `2019` `2020` `2021` `2022` `2023`
<chr> <chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 Køn i a… Alde… Uanse… Person… 2.79e6 2.88e6 2.90e6 2.79e6 2.84e6 2.92e6 2.96e6
2 Køn i a… Alde… Uanse… Person… 4.82e1 4.95e1 4.98e1 4.78e1 4.83e1 4.92e1 4.97e1
3 Køn i a… Alde… Uanse… Ophold… 2.1 e0 2.2 e0 2.3 e0 2.1 e0 2.2 e0 2.2 e0 2.2 e0
4 Køn i a… Alde… Uanse… Person… 2.71e6 2.80e6 2.83e6 2.73e6 2.77e6 2.85e6 2.90e6
5 Køn i a… Alde… Uanse… Person… 4.69e1 4.83e1 4.86e1 4.67e1 4.71e1 4.81e1 4.86e1
6 Køn i a… Alde… Uanse… Ophold… 2 e0 2 e0 2.2 e0 2 e0 2 e0 2.1 e0 2.1 e0
# ℹ 1 more variable: `2024` <dbl>
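A small sketch of one way to do that renaming, assuming dplyr is loaded. The toy tibble `raw` has a single row with only two year columns (values taken from the cleaned output further down); the real table has eight, 2017 to 2024.

```r
library(dplyr)

# Toy stand-in for the freshly read data, with readr's default X1..Xn names.
# "\u00e6" is æ and "\u00e5" is å.
raw <- tibble(
  X1 = "M\u00e6nd",
  X2 = "0-17 \u00e5r",
  X3 = "Somatik",
  X4 = "Personer med ophold (antal)",
  X5 = 217228,
  X6 = 216064
)

year_labels <- c("2017", "2018")  # the real table runs from 2017 to 2024

named <- raw |>
  # The four grouping columns get descriptive names
  rename(sex = X1, age = X2, sector = X3, measure = X4) |>
  # The remaining value columns are named after their year
  rename_with(~ year_labels, .cols = X5:X6)
```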
The first column, sex, is “Køn i alt”, which is both sexes combined. I want to see the difference between the sexes, so I will filter out those rows. Also, I want age ranges, not the total. Finally, I want it to be divided into somatic and psychiatric. All of these totals will be filtered out below.
# A tibble: 6 × 12
sex age sector measure `2017` `2018` `2019` `2020` `2021` `2022`
<chr> <chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 Mænd 0-17 år Somatik Persone… 2.17e+5 2.16e+5 2.19e+5 2.07e+5 2.06e+5 2.15e+5
2 Mænd 0-17 år Somatik Persone… 3.64e+1 3.63e+1 3.69e+1 3.5 e+1 3.49e+1 3.62e+1
3 Mænd 0-17 år Somatik Ophold … 9 e-1 9 e-1 9 e-1 9 e-1 9 e-1 9 e-1
4 Mænd 0-17 år Somatik Persone… 2.06e+5 2.06e+5 2.09e+5 1.98e+5 1.97e+5 2.06e+5
5 Mænd 0-17 år Somatik Persone… 3.45e+1 3.46e+1 3.53e+1 3.36e+1 3.34e+1 3.47e+1
6 Mænd 0-17 år Somatik Ophold … 8 e-1 8 e-1 8 e-1 8 e-1 8 e-1 8 e-1
# ℹ 2 more variables: `2023` <dbl>, `2024` <dbl>
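The filtering step can be sketched like this, assuming dplyr is loaded. The tibble `toy` is a made-up four-row example that only carries the three grouping columns; the labels for the totals are the ones seen in the raw output.

```r
library(dplyr)

# Made-up rows: one all-totals row, two detailed rows, one row with an age total.
toy <- tibble(
  sex    = c("K\u00f8n i alt", "M\u00e6nd", "Kvinder", "Kvinder"),
  age    = c("Alder i alt", "0-17 \u00e5r", "0-17 \u00e5r", "Alder i alt"),
  sector = c("Uanset sygehusv\u00e6sen", "Somatik", "Psykiatri", "Somatik")
)

# Keep only rows where none of the three grouping columns is a total.
kept <- toy |>
  filter(
    sex    != "K\u00f8n i alt",
    age    != "Alder i alt",
    sector != "Uanset sygehusv\u00e6sen"
  )
```

Conditions separated by commas inside `filter()` are combined with AND, so a row is dropped as soon as any one of its grouping columns holds a total.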
This leaves a dataset divided into the following groupings:
And then subvariables for each of the above for:
# A tibble: 9 × 1
measure
<chr>
1 Personer med ophold (antal)
2 Personer med ophold (pct.)
3 Ophold per person (antal)
4 Personer med ophold på under 12 timer (antal)
5 Personer med ophold på under 12 timer (pct.)
6 Ophold på under 12 timer per person (antal)
7 Personer med ophold på 12 timer eller derover (antal)
8 Personer med ophold på 12 timer eller derover (pct.)
9 Ophold på 12 timer eller derover per person (antal)
And, because it is there, and because I do not miss out on any learning by translating the data to English via GPT, I’m gonna put in an example dataset and ask GPT to translate all the Danish values to English. And of course I just do it by hand in the few places where simple strings can be changed.
# Hand translate the easy ones
df4 <- df3 |>
mutate(sex = str_replace_all(sex,"Mænd","Men"),
sex = str_replace_all(sex,"Kvinder","Women"),
age = str_replace_all(age,"år","years"),
age = str_replace_all(age,"og derover",""),
age = str_replace_all(age,"60 years","60+ years"),
sector = str_replace_all(sector,"Somatik","Somatic"),
sector = str_replace_all(sector,"Psykiatri","Psychiatry"),
sector = str_replace_all(sector,"Både somatik og psykiatri","Both somatic and psychiatry")
)
# GPT aided translation of the measure
# clean text
df5 <- df4 |>
mutate(
measure = str_squish(measure) # trim & collapse whitespace
)
# Define a named lookup (DA -> EN)
da_en <- c(
"Personer med ophold (antal)" = "Persons with stays (number)",
"Personer med ophold (pct.)" = "Persons with stays (percent)",
"Ophold per person (antal)" = "Stays per person (number)",
"Personer med ophold på under 12 timer (antal)" = "Persons with stays under 12 hours (number)",
"Personer med ophold på under 12 timer (pct.)" = "Persons with stays under 12 hours (percent)",
"Ophold på under 12 timer per person (antal)" = "Stays under 12 hours per person (number)",
"Personer med ophold på 12 timer eller derover (antal)"= "Persons with stays of 12 hours or more (number)",
"Personer med ophold på 12 timer eller derover (pct.)" = "Persons with stays of 12 hours or more (percent)",
"Ophold på 12 timer eller derover per person (antal)" = "Stays of 12 hours or more per person (number)"
)
# Translate (keep originals that don’t match)
df6 <- df5 |>
mutate(measure = recode(measure, !!!da_en, .default = measure))
I had a hard time understanding the !!! thingy. Its reference page is helpful.
In plain terms that I can understand, it injects, or splices, x into y, where x is a list or vector and y is something else, typically a function call. In the above example, the “lookup” object called da_en is spliced into the recode() call. So basically it tells recode() that, for its chosen variable, measure (the column originally called X4), it can take each name-value pair in the lookup as a replacement it must go through.
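To make the splicing concrete, here is a tiny sketch, assuming dplyr is loaded. The vector `x` and its third value “Andet” are made up for the illustration; the point is that splicing the lookup gives the same result as typing out every pair by hand.

```r
library(dplyr)

# A small named lookup: names are the old values, values are the new ones.
lookup <- c("Somatik" = "Somatic", "Psykiatri" = "Psychiatry")

# "Andet" is a hypothetical value with no entry in the lookup.
x <- c("Somatik", "Psykiatri", "Andet")

# Splicing: each element of `lookup` becomes its own named argument.
spliced <- recode(x, !!!lookup)

# The same call written out by hand.
written_out <- recode(x, "Somatik" = "Somatic", "Psykiatri" = "Psychiatry")
```

Both calls give the same vector, and values without a match in the lookup are left as they are.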
Another way, which I am used to and learned before using AI, is just using case_when(). I like that one a lot since it is super easy to understand. When i is the case, set a chosen variable, j to k. It is perhaps better explained as a vectorized if-else statement.
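A short sketch of the case_when() version, assuming dplyr is loaded. The tibble `toy` and the new column `sector_en` are made up for the illustration.

```r
library(dplyr)

# "B\u00e5de" is "Både" written with an escape to keep the script ASCII.
toy <- tibble(
  sector = c("Somatik", "Psykiatri", "B\u00e5de somatik og psykiatri")
)

translated <- toy |>
  mutate(
    sector_en = case_when(
      sector == "Somatik"                        ~ "Somatic",
      sector == "Psykiatri"                      ~ "Psychiatry",
      sector == "B\u00e5de somatik og psykiatri" ~ "Both somatic and psychiatry",
      TRUE                                       ~ sector  # anything else stays as it is
    )
  )
```

The conditions are checked from top to bottom for every row, and the first one that is TRUE decides the value, which is why the catch-all goes last.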
Additionally, I will need to reshape / pivot the data.
# A tibble: 6 × 6
sex age sector measure year value
<fct> <fct> <fct> <fct> <int> <dbl>
1 Men 0-17 years Somatic Persons with stays (number) 2017 217228
2 Men 0-17 years Somatic Persons with stays (number) 2018 216064
3 Men 0-17 years Somatic Persons with stays (number) 2019 218968
4 Men 0-17 years Somatic Persons with stays (number) 2020 207051
5 Men 0-17 years Somatic Persons with stays (number) 2021 206277
6 Men 0-17 years Somatic Persons with stays (number) 2022 214619
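The reshaping can be sketched roughly like this, assuming tidyr and dplyr are loaded. The one-row tibble `wide` uses the first three values from the output above; the real table has one column per year from 2017 to 2024.

```r
library(dplyr)
library(tidyr)

# One wide row: grouping columns plus one column per year.
wide <- tibble(
  sex = "Men",
  age = "0-17 years",
  sector = "Somatic",
  measure = "Persons with stays (number)",
  `2017` = 217228,
  `2018` = 216064,
  `2019` = 218968
)

# Turn the year columns into two columns: `year` and `value`.
long <- wide |>
  pivot_longer(
    cols = `2017`:`2019`,
    names_to = "year",
    values_to = "value",
    names_transform = list(year = as.integer)  # column names are text, so convert
  )
```

Each wide row becomes one row per year, which is the shape ggplot2 wants when year goes on the x axis.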
So - Now we have a dataset with the following:
That should be clean enough to start putting into an R shiny app.
Let’s see what some of the data looks like before creating a dashboard. We can work with the bigger and simpler numbers first: the number of people who have stayed in the hospital, by year.
It doesn’t make much sense to compare the different sectors visually, so let’s look at the sectors by themselves to start with, and focus on sex.
These graphs are very interesting for displaying differences between the sexes, in terms of stays in hospital.
Notable results:
And this is only the counts. What about all the other measures?
In my estimation, a lot of info can be gleaned just from setting up a dashboard that lets you pick between these different measures, with the three plots I made above.
Loading required package: shiny
Loading required package: bslib
Attaching package: 'bslib'
The following object is masked from 'package:utils':
page
shiny bslib
TRUE TRUE
With the R shiny packages loaded (bslib just lets you customize the R shiny dashboard more), it is time to start setting up the dashboard.
It seems like the example in the bslib GitHub repository fits what I want to create as an initial dashboard:
So let’s go. With R shiny you can make dashboards which update based on your input. Shiny code, producing a dashboard, basically consists of:
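As I understand it, the usual pieces are a UI definition, a server function, and a call to shinyApp() that ties them together. Here is a minimal sketch, assuming only the shiny package; the input id `n` and the output id `scatter` are made up for the illustration.

```r
library(shiny)

# 1. UI: what the page contains (inputs and placeholders for outputs).
ui <- fluidPage(
  sliderInput("n", "Number of points", min = 10, max = 100, value = 50),
  plotOutput("scatter")
)

# 2. Server: how each output is computed from the inputs.
#    The plot is redrawn whenever input$n changes.
server <- function(input, output, session) {
  output$scatter <- renderPlot({
    x <- seq_len(input$n)
    plot(x, x^2, xlab = "x", ylab = "x squared")
  })
}

# 3. Combine the two into an app object.
app <- shinyApp(ui, server)
# Printing `app` at the console (or calling runApp(app)) launches it locally.
```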
You can host the dashboard locally or deploy it to the shinyapps.io website, which requires a user account.
I will try to make it here, and make it available on that website, or here in the blogpost if that is possible.
First, I take the code from the bslib GitHub mentioned above that fits what I want:
data(penguins, package = "palmerpenguins")
ui <- page_sidebar(
title = "Penguins dashboard",
sidebar = sidebar(
title = "Histogram controls",
varSelectInput(
"var", "Select variable",
dplyr::select_if(penguins, is.numeric)
),
numericInput("bins", "Number of bins", 30)
),
card(
card_header("Histogram"),
plotOutput("p")
)
)
server <- function(input, output) {
output$p <- renderPlot({
ggplot(penguins) +
geom_histogram(aes(!!input$var), bins = input$bins) +
theme_bw(base_size = 20)
})
}
shinyApp(ui, server)
Alright. That actually just works! It seems to have opened in RStudio, so it must be hosted locally.
Now, I have to adapt it to my needs:
Here we go.
# Cards
cards <- list(
card(
full_screen = TRUE,
card_header("Somatic sector"),
plotOutput("soma")
),
card(
full_screen = TRUE,
card_header("Psychiatric sector"),
plotOutput("psych")
),
card(
full_screen = TRUE,
card_header("Both sectors"),
plotOutput("somapsych")
)
)
measure <- selectInput(
"measure", "Measure",
choices = sort(unique(df7$measure))
)
And then the dashboard.
plot_sector <- function(data, sector_label) {
  data_sector <- data |> filter(sector == sector_label)
  req(nrow(data_sector) > 0) # show nothing if that sector isn't present
  ggplot(data_sector, aes(x = year, y = value, color = age)) +
    geom_line() +
    geom_point() +
    facet_wrap(~ sex, nrow = 2) +
    labs(
      title = sector_label,
      x = NULL, color = "Age",
      y = NULL
    ) +
    # Complete theme first: adding theme_bw() last would reset the tweaks below
    theme_bw(base_size = 12) +
    theme(
      plot.title = element_blank(), # the card header already names the sector
      legend.position = "bottom"
    )
}
measures <- df7 |>
distinct(measure) |>
arrange(measure) |>
pull(measure)
ui <- page_sidebar(
title = "Hospital stays — dashboard",
sidebar = tagList(
selectInput("measure", "Measure", choices = measures, selected = measures[1]),
helpText("All plots update to the selected measure."),
helpText("Lines = time"),
helpText("Color = age"),
helpText("Facets = sex"),
helpText("Data has been supplied by Statistics Denmark (table SBR04), which contains aggregate data on hospital usage in Denmark between 2017-2024.")
),
# 3 cards side-by-side (wraps on narrow screens)
layout_columns(width = 1,
card(
full_screen = TRUE,
card_header("Somatic"),
plotOutput("plot_somatic", height = 300)
),
card(
full_screen = TRUE,
card_header("Psychiatric"),
plotOutput("plot_psychiatric", height = 300)
),
card(
full_screen = TRUE,
card_header("Both"),
plotOutput("plot_both", height = 300)
)
)
)
server <- function(input, output, session) {
# Filter once by measure; reuse for all sectors
dat_measure <- reactive({
req(input$measure)
df7 |> filter(measure == input$measure)
})
output$plot_somatic <- renderPlot(plot_sector(dat_measure(), sector_label = "Somatic"))
output$plot_psychiatric <- renderPlot(plot_sector(dat_measure(), sector_label = "Psychiatry"))
output$plot_both <- renderPlot(plot_sector(dat_measure(), sector_label = "Both somatic and psychiatry"))
}
shinyApp(ui, server)
Listening on http://127.0.0.1:7617
Fantastic!!! It is now created. But how will it be viewed on the blog? The guide on shinyapps.io details that we need to install rsconnect, authorize the account, and then deploy. See the Getting Started page of shinyapps.io when you log in. Or, if using RStudio, it can just be deployed with the “Publish” button.
But I am confused about how an app will be displayed on this blog.
The simplest solution seems to be a link within the blog post to the shiny dashboard, hosted on the shinyapps.io servers.
Below is the link to the resulting dashboard. Purpose complete!
#| label: Summary chunk
#| echo: true
#| output: false
# Packages
ipak <- function(pkg){
new.pkg <- pkg[!(pkg %in% installed.packages()[, "Package"])]
if (length(new.pkg))
install.packages(new.pkg, dependencies = TRUE)
sapply(pkg, require, character.only = TRUE)
}
ipak_list <- c("tidyverse", "here", "shiny", "bslib")
ipak(ipak_list)
# Load data
df <- read_delim("insert file name",
delim = ";",
locale = locale(encoding = "ISO-8859-1"),
col_names = FALSE
) |>
rename(sex = X1, age = X2, sector = X3, measure = X4)
# Prepping for long data cleaning chain
year_labels <- c("2017","2018","2019","2020","2021","2022","2023","2024")
names(df)[5:12] <- year_labels
da_en <- c(
"Personer med ophold (antal)" = "Persons with stays (number)",
"Personer med ophold (pct.)" = "Persons with stays (percent)",
"Ophold per person (antal)" = "Stays per person (number)",
"Personer med ophold på under 12 timer (antal)" = "Persons with stays under 12 hours (number)",
"Personer med ophold på under 12 timer (pct.)" = "Persons with stays under 12 hours (percent)",
"Ophold på under 12 timer per person (antal)" = "Stays under 12 hours per person (number)",
"Personer med ophold på 12 timer eller derover (antal)"= "Persons with stays of 12 hours or more (number)",
"Personer med ophold på 12 timer eller derover (pct.)" = "Persons with stays of 12 hours or more (percent)",
"Ophold på 12 timer eller derover per person (antal)" = "Stays of 12 hours or more per person (number)"
)
# Data cleaning chain
df <- df |>
filter(sex != "Køn i alt",
age != "Alder i alt",
sector != "Uanset sygehusvæsen") |>
mutate(sex = str_replace_all(sex,"Mænd","Men"),
sex = str_replace_all(sex,"Kvinder","Women"),
age = str_replace_all(age,"år","years"),
age = str_replace_all(age,"og derover",""),
age = str_replace_all(age,"60 years","60+ years"),
sector = str_replace_all(sector,"Somatik","Somatic"),
sector = str_replace_all(sector,"Psykiatri","Psychiatry"),
sector = str_replace_all(sector,"Både somatik og psykiatri","Both somatic and psychiatry")
) |>
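  # Reshape to long format so that `year` and `value` exist for the plots
  pivot_longer(cols = all_of(year_labels),
               names_to = "year",
               values_to = "value",
               names_transform = list(year = as.integer)) |>
  mutate(measure = str_squish(measure),
         measure = recode(measure, !!!da_en, .default = measure),
         across(where(is.character), as.factor),
         value2 = value / 1000
  )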
# Pilot Plot
df |> filter(
measure == "Persons with stays (number)",
sector == "Both somatic and psychiatry" ) |>
ggplot(aes(year, value2, color = age)) +
geom_line() +
facet_wrap( ~ sex) +
labs(x = NULL, y = "Count in 1000s")
# Shiny app
cards <- list(
card(
full_screen = TRUE,
card_header("Somatic sector"),
plotOutput("soma")
),
card(
full_screen = TRUE,
card_header("Psychiatric sector"),
plotOutput("psych")
),
card(
full_screen = TRUE,
card_header("Both sectors"),
plotOutput("somapsych")
)
)
# selectInput() picks among values; varSelectInput() is for picking columns
measure <- selectInput(
  "measure", "Measure",
  choices = sort(unique(as.character(df$measure)))
)
plot_sector <- function(data, sector_label) {
  data_sector <- data |> filter(sector == sector_label)
  req(nrow(data_sector) > 0) # show nothing if that sector isn't present
  ggplot(data_sector, aes(x = year, y = value, color = age)) +
    geom_line() +
    geom_point() +
    facet_wrap(~ sex, nrow = 2) +
    labs(
      title = sector_label,
      x = NULL, color = "Age",
      y = NULL
    ) +
    # Complete theme first: adding theme_bw() last would reset the tweaks below
    theme_bw(base_size = 12) +
    theme(
      plot.title = element_blank(), # the card header already names the sector
      legend.position = "bottom"
    )
}
measures <- df |>
distinct(measure) |>
arrange(measure) |>
pull(measure)
ui <- page_sidebar(
title = "Hospital stays — dashboard",
sidebar = tagList(
selectInput("measure", "Measure", choices = measures, selected = measures[1]),
helpText("All plots update to the selected measure."),
helpText("Lines = time"),
helpText("Color = age"),
helpText("Facets = sex"),
helpText("Data has been supplied by Statistics Denmark (table SBR04), which contains aggregate data on hospital usage in Denmark between 2017-2024.")
),
# 3 cards side-by-side (wraps on narrow screens)
layout_columns(width = 1,
card(
full_screen = TRUE,
card_header("Somatic"),
plotOutput("plot_somatic", height = 300)
),
card(
full_screen = TRUE,
card_header("Psychiatric"),
plotOutput("plot_psychiatric", height = 300)
),
card(
full_screen = TRUE,
card_header("Both"),
plotOutput("plot_both", height = 300)
)
)
)
server <- function(input, output, session) {
# Filter once by measure; reuse for all sectors
dat_measure <- reactive({
req(input$measure)
df |> filter(measure == input$measure)
})
output$plot_somatic <- renderPlot(plot_sector(dat_measure(), sector_label = "Somatic"))
output$plot_psychiatric <- renderPlot(plot_sector(dat_measure(), sector_label = "Psychiatry"))
output$plot_both <- renderPlot(plot_sector(dat_measure(), sector_label = "Both somatic and psychiatry"))
}
shinyApp(ui, server)